 mutual exclusivity



Mutual exclusivity as a challenge for deep neural networks

Neural Information Processing Systems

Strong inductive biases allow children to learn in fast and adaptable ways. Children use the mutual exclusivity (ME) bias to help disambiguate how words map to referents, assuming that if an object has one label then it does not need another. In this paper, we investigate whether or not vanilla neural architectures have an ME bias, demonstrating that they lack this learning assumption. Moreover, we show that their inductive biases are poorly matched to lifelong learning formulations of classification and translation. We demonstrate that there is a compelling case for designing task-general neural networks that learn through mutual exclusivity, which remains an open challenge.



Review for NeurIPS paper: Mutual exclusivity as a challenge for deep neural networks

Neural Information Processing Systems

The experiments assume that the models exhibit the bias if the probability of the new class(es) given a new word is one. It is not clear why the models are expected to assign a probability of one to the correct (new) class. When testing classification models, the correct class is the one to which the model assigns the highest probability, and this probability is often much smaller than one. This is because the sum of the prior probabilities over all incorrect classes is relatively large when there are many classes, even though the probability of each individual class is small. Moreover, when testing the models in a continual learning setup, the authors should continue training before the model overfits, and report the performance of the models on a held-out split.
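The reviewer's argument about many-class probabilities can be illustrated with a small numerical sketch. It is not taken from the paper or the review: the class count (1000), the logit values, and the helper names `softmax` and `top_class_and_rest` are all assumptions chosen for illustration. It shows that the top-ranked class can clearly beat every individual alternative while still receiving a probability far below one.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class_and_rest(logits):
    """Return (index of top class, its probability, total mass on all other classes)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best], 1.0 - probs[best]


# Hypothetical setup: 1000 classes with equal scores, except class 7,
# which the model prefers by a logit margin of 3.0.
num_classes = 1000
logits = [0.0] * num_classes
logits[7] = 3.0

best, p_best, p_rest = top_class_and_rest(logits)
p_single_other = softmax(logits)[0]  # probability of one arbitrary competing class

print(f"top class: {best}")
print(f"probability of top class: {p_best:.4f}")
print(f"probability of one other class: {p_single_other:.6f}")
print(f"combined probability of all other classes: {p_rest:.4f}")
```

Under these assumed numbers the preferred class is roughly twenty times more probable than any single competitor, yet its probability is only about 0.02, because the remaining mass is spread across 999 alternatives.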


Review for NeurIPS paper: Mutual exclusivity as a challenge for deep neural networks

Neural Information Processing Systems

The paper received mixed reviews from four reviewers. The reviewers generally agree that the paper is interesting and exposes a promising research direction: an ability that comes naturally to humans but is lacking in most modern machine learning systems. The main concerns raised by the reviewers are the reliance on synthetic data and the lack of a concrete proposal for how to incorporate mutual exclusivity into the model as an inductive bias. The AC believes the use of synthetic data is not sufficient reason for rejection because ultimately machine learning systems need to work on all cases. Several other concerns were also raised (architecture search, no other model has ME), but these are minor and are not significant weaknesses of the contribution itself.


Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies

Baugh, Kexin Gu, Dickens, Luke, Russo, Alessandra

arXiv.org Artificial Intelligence

Although deep reinforcement learning has been shown to be effective, the model's black-box nature presents barriers to direct policy interpretation. To address this problem, we propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning. The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training. At the same time, its architecture is designed so that trained models can be directly translated into interpretable policies expressed as standard (bivalent or probabilistic) logic programs. Moreover, additional layers can be included to extract abstract features from complex observations, acting as a form of predicate invention. The logic representations are highly interpretable, and we show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model, facilitating manual intervention and adaptation of learned policies. We evaluate our approach on a range of tasks requiring learning deterministic or stochastic behaviours from various forms of observations. Our empirical results show that our neural DNF-MT model performs at the level of competing black-box methods whilst providing interpretable policies.


A Bayesian Framework for Cross-Situational Word-Learning

Neural Information Processing Systems

For infants, early word learning is a chicken-and-egg problem. One way to learn a word is to observe that it co-occurs with a particular referent across different situations. Another way is to use the social context of an utterance to infer the intended referent of a word. Here we present a Bayesian model of cross-situational word learning, and an extension of this model that also learns which social cues are relevant to determining reference. We test our model on a small corpus of mother-infant interaction and find it performs better than competing models. Finally, we show that our model accounts for experimental phenomena including mutual exclusivity, fast-mapping, and generalization from social cues.


Learning to Coordinate with Humans using Action Features

Ma, Mingwei, Liu, Jizhou, Sokota, Samuel, Kleiman-Weiner, Max, Foerster, Jakob

arXiv.org Artificial Intelligence

An unaddressed challenge in human-AI coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans take advantage of these relationships in highly intuitive ways. For instance, in the absence of a shared language, we might point to the object we desire or hold up our fingers to indicate how many objects we want. To address this challenge, we investigate the effect of network architecture on the propensity of learning algorithms to exploit these semantic relationships. Across a procedurally generated coordination task, we find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for zero-shot coordination. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, such agents coordinate with people without training on any human data.